All result files are provided as UTF-8-encoded CSV files.

**Column descriptions:**

* Question: The text input given to the model;

* Type: The question type;

* Picture\_ID: The name of the corresponding picture, used by the program to look it up;

* Pic\_type: The picture type;

* Speaker: Who the speaker is (relevant to perspective);

* Addressee: Who the addressee is (relevant to perspective);

* Distance: Proximity, either proximal or distal;

* Expert: The expert answer;

* Extracted\_Answer: The answer extracted from the model's output;

* Comparison: The result of comparing the expert answer with the extracted answer, used to compute accuracy.

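As a rough illustration of how these columns might be used, the sketch below loads a result file with Python's standard `csv` module and computes accuracy grouped by a chosen column. It is only a sketch under stated assumptions: the encoding of the Comparison column is not specified here, so the example recomputes the match from Expert and Extracted\_Answer using case-insensitive, whitespace-trimmed equality, which may differ from how Comparison was actually produced. The helper names (`load_rows`, `is_match`, `accuracy_by`) and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_rows(source):
    """Read result rows from a file path or an already-open text stream."""
    if isinstance(source, str):
        # utf-8-sig reads plain UTF-8 and also drops a leading BOM if present.
        with open(source, newline="", encoding="utf-8-sig") as handle:
            return list(csv.DictReader(handle))
    return list(csv.DictReader(source))


def is_match(row):
    # Assumption: a row counts as correct when the two answers are equal
    # after trimming whitespace and ignoring case.
    expert = (row.get("Expert") or "").strip().lower()
    extracted = (row.get("Extracted_Answer") or "").strip().lower()
    return expert != "" and expert == extracted


def accuracy_by(rows, column):
    """Return {column value: fraction of rows whose answers match}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        key = row.get(column, "")
        totals[key] += 1
        hits[key] += is_match(row)  # True counts as 1, False as 0
    return {key: hits[key] / totals[key] for key in totals}


# Tiny made-up sample for illustration only (not real results).
sample = io.StringIO(
    "Question,Type,Distance,Expert,Extracted_Answer\n"
    "q1,A,proximal,this,this\n"
    "q2,A,distal,that,this\n"
    "q3,B,distal,that, That \n"
)
rows = load_rows(sample)
print(accuracy_by(rows, "Type"))
```

To run this on an actual result file, pass its path to `load_rows` instead of the in-memory sample. Grouping by `Distance`, `Speaker`, or `Addressee` works the same way by changing the second argument of `accuracy_by`.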